Techniques for Improving the Cache Performance in Parallel Applications
Authors
Abstract
The performance of parallel programs suffers from memory access latencies induced by cache misses. To investigate the causes of these misses, this paper executes data-parallel applications on shared-memory multiprocessors. The experiments show that conflict misses account for most of the cache misses, owing to cross-interference among grains, each of which is composed of a portion of the data arrays. To address this problem, a tailored grain size is derived from the underlying cache architecture. Besides interference among grains, cache performance is also sensitive to the way the data are structured. To build data structures that exhibit good cache behavior, a stride merging-arrays method is presented. This method reduces cache conflict misses and also reduces useless prefetches in cache lines that hold multiple words. Simulation results show that these techniques may enhance the performance of parallel applications through improved cache performance.
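The abstract does not spell out the stride merging-arrays method, so the following is only a minimal sketch of the general array-merging idea it builds on, not the authors' algorithm. It models a direct-mapped cache and compares two layouts for a loop that reads a[i] and b[i] in turn: two separate arrays whose bases are exactly one cache size apart, and a single merged (interleaved) array. The cache size, line size, word size, and all function names are assumptions chosen for illustration.

```python
def count_misses(addresses, cache_bytes=1024, line_bytes=16):
    """Count misses in a simple direct-mapped cache model.

    The cache geometry is an illustrative assumption, not a value
    taken from the paper.
    """
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines           # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes      # memory block holding this byte
        index = block % num_lines       # cache line the block maps to
        tag = block // num_lines        # distinguishes blocks sharing a line
        if tags[index] != tag:
            misses += 1
            tags[index] = tag           # evict whatever was there
    return misses


WORD = 4    # bytes per array element (assumed)
N = 256     # elements per array; N * WORD equals the assumed cache size


def separate_layout(n=N):
    """Addresses touched by reading a[i], b[i] from two separate arrays.

    b starts exactly one cache size after a, so a[i] and b[i] always
    map to the same cache line and keep evicting each other.
    """
    base_a, base_b = 0, n * WORD
    for i in range(n):
        yield base_a + i * WORD
        yield base_b + i * WORD


def merged_layout(n=N):
    """Addresses touched when a and b are interleaved in one array.

    a[i] and b[i] are adjacent words, so they share a cache line
    instead of competing for one.
    """
    for i in range(n):
        yield (2 * i) * WORD        # a[i]
        yield (2 * i + 1) * WORD    # b[i]


if __name__ == "__main__":
    print("separate arrays:", count_misses(separate_layout()))
    print("merged array:   ", count_misses(merged_layout()))
```

Under these assumed parameters the separate layout misses on all 512 accesses, while the merged layout misses 128 times, once per 16-byte line. Every word fetched into a line is used in the merged case, which is the sense in which merging can also cut useless prefetching within multi-word lines.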
Similar resources
Performance Coupling: Case Studies for Improving the Performance of Scientific Applications
Traditional performance optimization techniques have focused on finding the kernel in an application that is the most time consuming and attempting to optimize it. In this paper we focus on an optimization technique with a more global perspective of the application. In particular, we present a methodology for measuring the interaction, or coupling, between kernels within an application and descri...
Improve Replica Placement in Content Distribution Networks with Hybrid Technique
The increasing use of the Internet and its accelerated growth reduce the available network bandwidth and server capacity; as a result, the quality of Internet services becomes unacceptable to users, even though efficient and effective delivery of web content plays an important role in improving performance. Content distribution networks were introduced to address this issue. Replicatin...
Multi Level Caching and Anticipated Parallel Processing-Based Algorithm for Improving the Performance of the Distributed File System
Large amounts of data are being generated by the extensive use of web applications by billions of users around the globe. Organizations that have deployed web applications are considering solutions for scalable storage and faster access to large data. Distributed file systems (DFSs) have emerged as efficient storage solutions so that the data can be stored and accessed efficient...
Review of techniques for improving the uniformity of dose distribution in total body irradiation (TBI) with parallel – opposed anterior and posterior geometry
Total body irradiation (TBI) is a form of external beam radiotherapy used in conjunction with chemotherapy for immunosuppression before bone marrow transplantation. As recommended by the AAPM, dose distribution uniformity in TBI is very important, and dose variation must be within ±10% of the prescription dose. Patient treatment geometries for TBI techniques fall into two co...
Improving Performance for Software MPEG Players
In this paper, we present a technique for improving the cache memory performance for software MPEG players. We motivate this technique by first presenting a characterization of cache behavior for mpeg play and mpeg2play MPEG applications. We then apply two hardware-based prefetching techniques to improve the cache memory performance. Previously published work has focused on applications of pref...
Publication date: 1999